69 research outputs found

Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

Author: Guo Jingcai
Guo Song
Han Zhenhua
Jiang Xinyang
Li Ruibin
Shen Yifei
Zhang Jie
Zhou Qihua
Publication date: 01/06/2023

Diffusion-based Generative Models (DGMs) have achieved unparalleled performance in synthesizing high-quality visual content, opening up the opportunity to improve image super-resolution (SR) tasks. Recent solutions for these tasks often train architecture-specific DGMs from scratch, or require iterative fine-tuning and distillation on pre-trained DGMs, both of which take considerable time and hardware investments. More seriously, since the DGMs are established with a discrete pre-defined upsampling scale, they cannot well match the emerging requirements of arbitrary-scale super-resolution (ASSR), where a unified model adapts to arbitrary upsampling scales, instead of preparing a series of distinct models for each case. These limitations beg an intriguing question: can we identify the ASSR capability of existing pre-trained DGMs without the need for distillation or fine-tuning? In this paper, we take a step towards resolving this matter by proposing Diff-SR, a first ASSR attempt based solely on pre-trained DGMs, without additional training efforts. It is motivated by an exciting finding that a simple methodology, which first injects a specific amount of noise into the low-resolution images before invoking a DGM's backward diffusion process, outperforms current leading solutions. The key insight is determining a suitable amount of noise to inject, i.e., small amounts lead to poor low-level fidelity, while over-large amounts degrade the high-level signature. Through a finely-grained theoretical analysis, we propose the Perceptual Recoverable Field (PRF), a metric that achieves the optimal trade-off between these two factors. Extensive experiments verify the effectiveness, flexibility, and adaptability of Diff-SR, demonstrating superior performance to state-of-the-art solutions under diverse ASSR environments
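The noise-injection step described in this abstract can be illustrated with the standard DDPM forward process, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps. The following is only a minimal sketch under assumptions not stated in the abstract: the linear beta schedule, the function names `make_alpha_bar` and `inject_noise`, and the random array standing in for an upsampled low-resolution image are all hypothetical. It does not reproduce the PRF analysis that selects the noise level, nor the DGM's backward process that would follow.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def inject_noise(image, t, alpha_bar, rng):
    """Forward-diffuse an image to step t: x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(image.shape)
    a_t = alpha_bar[t]
    return np.sqrt(a_t) * image + np.sqrt(1.0 - a_t) * eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
# Hypothetical stand-in for a low-resolution image already resized to the target scale.
image = rng.uniform(-1.0, 1.0, size=(8, 8))

for t in (50, 500, 950):
    x_t = inject_noise(image, t, alpha_bar, rng)
    # The weight on the original image shrinks as t grows, so a small t keeps
    # the input nearly intact while a large t replaces most of it with noise.
    print(t, float(np.sqrt(alpha_bar[t])), x_t.shape)
```

In this picture the trade-off named in the abstract corresponds to the choice of `t`: the noised image would then be handed to a pre-trained model's reverse process starting from that step.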

arXiv.org e-Print Archive

Spatio-Temporal Calibration for Omni-Directional Vehicle-Mounted

Author: Guo Ruibin
Li Xiao
Lu Huimin
Peng Xin
Zhou Yi
Zhou Zongtan
Publication date: 13/07/2023

We present a solution to the problem of spatio-temporal calibration for event cameras mounted on an omni-directional vehicle. Different from traditional methods that typically determine the camera's pose with respect to the vehicle's body frame using alignment of trajectories, our approach leverages the kinematic correlation of two sets of linear velocity estimates from event data and wheel odometers, respectively. The overall calibration task consists of estimating the underlying temporal offset between the two heterogeneous sensors, and furthermore, recovering the extrinsic rotation that defines the linear relationship between the two sets of velocity estimates. The first sub-problem is formulated as an optimization problem that seeks the temporal offset maximizing a correlation measurement invariant to arbitrary linear transformation. Once the temporal offset is compensated, the extrinsic rotation can be worked out with an iterative closed-form solver that incrementally registers associated linear velocity estimates. The proposed algorithm is shown to be effective on both synthetic data and real data, outperforming traditional methods based on alignment of trajectories
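The two-stage idea in this abstract (first find the temporal offset, then recover the rotation between the velocity sets) can be sketched on synthetic planar data. This is not the authors' solver: as a simplification, the sketch correlates speed magnitudes, which are invariant to rotation only (rather than the paper's measure invariant to arbitrary linear transformation), and it recovers the rotation in one shot with the Kabsch/SVD method instead of an iterative closed-form registration. All names and data are hypothetical.

```python
import numpy as np

def estimate_offset(speed_a, speed_b, max_shift):
    """Integer shift s maximizing the Pearson correlation of speed_a[k] and speed_b[k + s]."""
    best_shift, best_corr = 0, -np.inf
    n = len(speed_a)
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, len(speed_b) - s)
        corr = np.corrcoef(speed_a[lo:hi], speed_b[lo + s:hi + s])[0, 1]
        if corr > best_corr:
            best_shift, best_corr = s, corr
    return best_shift

def kabsch_rotation(v_src, v_dst):
    """Least-squares rotation R such that v_dst ~= v_src @ R.T (rows are vectors)."""
    h = v_src.T @ v_dst
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    correction = np.diag([1.0] * (h.shape[0] - 1) + [d])
    return vt.T @ correction @ u.T

# Synthetic planar velocities: a smooth random walk observed by two sensors.
rng = np.random.default_rng(1)
base = np.cumsum(rng.standard_normal((520, 2)), axis=0) * 0.05
angle = np.deg2rad(30.0)
r_true = np.array([[np.cos(angle), -np.sin(angle)],
                   [np.sin(angle),  np.cos(angle)]])
true_shift = 7
wheel = base[20:420]                                      # "odometer" velocities
cam = base[20 - true_shift:420 - true_shift] @ r_true.T   # delayed and rotated copy

shift = estimate_offset(np.linalg.norm(wheel, axis=1),
                        np.linalg.norm(cam, axis=1), max_shift=20)
# Compensate the offset (the estimated shift is non-negative in this example).
r_est = kabsch_rotation(wheel[:len(wheel) - shift], cam[shift:])
print(shift)
print(np.round(r_est, 3))
```

On real data the velocity estimates are noisy and the offset is not an integer number of samples, so a practical implementation would need interpolation and outlier handling that this sketch omits.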

arXiv.org e-Print Archive

Research for Inertia Response and Primary Frequency Regulation Ability of Wind Turbine

Author: Jiangtao GUO
Liling HUANG
Ruibin ZENG
Shuo CHEN
Yifeng ZHANG
Publication venue: Energy Observer Magazine Co., Ltd.
Publication date: 01/07/2023

[Introduction] Large-scale connection of wind power to the power grid poses great challenges to the stability (especially frequency stability) of grid operation. To address the inadequate frequency regulation capability caused by large-scale connection of wind power to the power grid and to improve the frequency adaptability of wind power grid connection, wind turbines need to provide frequency regulation with a timely response. [Method] This paper adopted a frequency regulation system scheme based on rotor kinetic energy and pitch angle reserve, which could provide active support for the power grid quickly and accurately during power grid frequency changes. Firstly, the main control algorithm was designed based on the theoretical analysis of inertia response and primary frequency regulation algorithm logic. Then, the functional verification was carried out on the co-simulation platform. Finally, the actual test was carried out in a project. [Result] The simulation and test results showed that the frequency regulation system scheme based on rotor kinetic energy and pitch angle reserve could cope with a variety of grid frequency changes and quickly provide active support. [Conclusion] The frequency regulation system scheme of wind turbines can perform a fast inertia response (with the response time less than 500 ms) and primary frequency regulation response (with the response time less than 5 s) under various frequency change conditions and provide active support for the power grid, which can help recover the grid frequency and effectively improve the frequency adaptability of wind turbines
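The combination of inertia response and primary frequency regulation mentioned in this abstract is commonly expressed as an active-power command with a term proportional to the rate of change of frequency plus a droop term proportional to the frequency deviation. The sketch below shows that textbook form only; it is not the paper's control algorithm, and the 50 Hz nominal frequency, dead band, gains, and power limit are all hypothetical values chosen for illustration.

```python
def active_power_support(freq_hz, rocof_hz_per_s, f_nominal=50.0,
                         deadband_hz=0.05, droop_gain=0.4,
                         inertia_gain=0.2, limit_pu=0.1):
    """Extra active power (per unit) requested from the turbine.

    Inertia term: proportional to -df/dt (in the scheme described above this
    energy would come from rotor kinetic energy).
    Droop term: proportional to the frequency deviation beyond a dead band
    (in the scheme described above this would come from pitch angle reserve).
    All gains and limits here are assumed values, not taken from the paper.
    """
    deviation = freq_hz - f_nominal
    if abs(deviation) <= deadband_hz:
        droop = 0.0
    else:
        # Measure the deviation from the edge of the dead band.
        edge = deadband_hz if deviation > 0 else -deadband_hz
        droop = -droop_gain * (deviation - edge)
    inertia = -inertia_gain * rocof_hz_per_s
    # Clip the command to the available reserve.
    return max(-limit_pu, min(limit_pu, droop + inertia))

# Under-frequency event with falling frequency: the command is positive.
print(active_power_support(49.8, -0.1))
# Over-frequency event with rising frequency: the command is negative.
print(active_power_support(50.2, 0.1))
```

A real controller would also have to meet the response-time targets quoted in the abstract and manage rotor speed recovery, which this static formula does not model.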

Directory of Open Access Journals

Uremia toxin helps to induce inflammation in intestines by activating the ATM/NEMO/NF-kB signalling pathway in human intestinal epithelial cells

Author: Guo Feng
Wang Lihui
Xue Xia
Yang Ruihong
Zhang Ruibin
Publication venue: NISCAIR-CSIR, India
Publication date: 30/09/2020

During progressive chronic kidney disease, toxic substances known as uremic toxins accumulate in body fluids. Uremia toxin has been documented to be involved in most inflammatory reactions, and indoxyl-sulfate (IS), a major serum metabolite of uremia, is a key player in this. The mechanism by which uremia toxin establishes its inflammatory activity is scarcely known; however, researchers believe that a clear understanding of this process can serve as a guide to combat the situation. The study was designed to investigate the role played by uremia toxin in intestinal inflammation. SW480 cells were used as the cell line for this study. A luciferase assay was used to measure cell viability at different concentrations of IS. RT-qPCR was used to detect the effect of IS on the expression of inflammatory factors. The comet assay was used as a tool to detect DNA damage. Western blot was used to detect the phosphorylation level of ATM/NEMO/NF-kB protein. An IS concentration of 0.09 nM was determined by luciferase assay to be the best experimental concentration. Results showed that IS promotes the expression of inflammatory factors TNF-α and IL-6. In addition, IS led to enhanced DNA damage in cells. IS promoted ATM phosphorylation, leading to phosphorylation of NEMO to activate the NF-kB signalling pathway. In conclusion, uremia toxin facilitates inflammation in intestines by activating the ATM/NEMO/NF-kB signalling pathway in human intestinal epithelial cells

Online Publishing @ NISCAIR

NOPR

Uremia toxin helps to induce inflammation in intestines by activating the ATM/NEMO/NF-kB signalling pathway in human intestinal epithelial cells

Author: Guo Feng
Wang Lihui
Xue Xia
Yang Ruihong
Zhang Ruibin
Publication venue: Indian Journal of Biochemistry and Biophysics (IJBB)
Publication date: 30/09/2020

During progressive chronic kidney disease, toxic substances known as uremic toxins accumulate in body fluids. Uremia toxin has been documented to be involved in most inflammatory reactions, and indoxyl-sulfate (IS), a major serum metabolite of uremia, is a key player in this. The mechanism by which uremia toxin establishes its inflammatory activity is scarcely known; however, researchers believe that a clear understanding of this process can serve as a guide to combat the situation. The study was designed to investigate the role played by uremia toxin in intestinal inflammation. SW480 cells were used as the cell line for this study. A luciferase assay was used to measure cell viability at different concentrations of IS. RT-qPCR was used to detect the effect of IS on the expression of inflammatory factors. The comet assay was used as a tool to detect DNA damage. Western blot was used to detect the phosphorylation level of ATM/NEMO/NF-kB protein. An IS concentration of 0.09 nM was determined by luciferase assay to be the best experimental concentration. Results showed that IS promotes the expression of inflammatory factors TNF-α and IL-6. In addition, IS led to enhanced DNA damage in cells. IS promoted ATM phosphorylation, leading to phosphorylation of NEMO to activate the NF-kB signalling pathway. In conclusion, uremia toxin facilitates inflammation in intestines by activating the ATM/NEMO/NF-kB signalling pathway in human intestinal epithelial cells

Online Publishing @ NISCAIR

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

Author: Benetos Emmanouil
Chen Wenhu
Dannenberg Roger
Fu Jie
Guo Yike
LI Yizhi
Lin Chenghua
Liu Si
Ma Yinghao
Pan Jiahao
Xue Wei
Yuan Ruibin
Zhang Ge
Zhuo Le
Publication date: 29/06/2023

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with a strong performance for contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset with a CC-BY-NC-SA copyright license, based on MTG-Jamendo, and offer a human-annotated subset for noise level estimation and evaluation. We anticipate that our proposed method and dataset will advance the development of multilingual lyrics transcription, a challenging and emerging task. Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 202

arXiv.org e-Print Archive

On the Effectiveness of Speech Self-supervised Learning for Music

Author: Benetos Emmanouil
Chen Xingran
Dannenberg Roger
Fu Jie
Guo Yike
Gyenge Norbert
Li Yizhi
Lin Chenghua
Liu Ruibo
Ma Yinghao
Ragni Anton
Xia Gus
Yin Hanzhi
Yuan Ruibin
Zhang Ge
Publication date: 11/07/2023

Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaption of SSL with two distinctive speech-related models, data2vec1.0 and Hubert, and refer to them as music2vec and musicHuBERT, respectively. We train 12 SSL models with 95M parameters under various pre-training configurations and systematically evaluate the MIR task performances with 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, empirical suggestions are also given for designing future musical SSL strategies and paradigms

arXiv.org e-Print Archive

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Author: Benetos Emmanouil
Chen Wenhu
Chen Xingran
Dannenberg Roger
Fu Jie
Guo Yike
Gyenge Norbert
Huang Wenhao
Li Yizhi
Lin Chenghua
Liu Ruibo
Ma Yinghao
Ragni Anton
Shi Yemin
Xia Gus
Yin Hanzhi
Yuan Ruibin
Zhang Ge
Publication date: 31/05/2023

Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is primarily due to the distinctive challenges associated with modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training. In our exploration, we identified a superior combination of teacher models, which outperforms conventional speech and audio approaches. This combination includes an acoustic teacher based on Residual Vector Quantization - Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance the representation robustness. Furthermore, we explore a wide range of settings to overcome the instability in acoustic language model pre-training, which allows our designed paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attains state-of-the-art (SOTA) overall scores. The code and models are online: https://github.com/yizhilll/MERT

arXiv.org e-Print Archive

Robust Visual Compass Using Hybrid Features for Indoor Environments

Author: Dongxiang Zhou
Keju Peng
Ruibin Guo
Yunhui Liu
Publication venue: MDPI AG
Publication date: 01/02/2019

Orientation estimation is a crucial part of robotics tasks such as motion control, autonomous navigation, and 3D mapping. In this paper, we propose a robust visual-based method to estimate robots’ drift-free orientation with RGB-D cameras. First, we detect and track hybrid features (i.e., plane, line, and point) from color and depth images, which provides reliable constraints even in uncharacteristic environments with low texture or no consistent lines. Then, we construct a cost function based on these features and, by minimizing this function, we obtain the accurate rotation matrix of each captured frame with respect to its reference keyframe. Furthermore, we present a vanishing direction-estimation method to extract the Manhattan World (MW) axes; by aligning the current MW axes with the global MW axes, we refine the aforementioned rotation matrix of each keyframe and achieve drift-free orientation. Experiments on public RGB-D datasets demonstrate the robustness and accuracy of the proposed algorithm for orientation estimation. In addition, we have applied our proposed visual compass to pose estimation, and the evaluation on public sequences shows improved accuracy

Directory of Open Access Journals

A Hypergraph Matching Labeled Multi-Bernoulli Filter for Group Targets Tracking

Author: Haoyang YU
Ran ZHU
Ruibin GUO
Wei AN
Publication venue: Institute of Electronics, Information and Communications Engineers (IEICE)

Crossref